Abstract

Monotonicity is a simple yet significant qualitative characteristic. We consider the problem of segmenting an array into at most K segments. We want the segments to be as monotonic as possible and to alternate in sign. We propose a quality metric for this problem, present an optimal linear time algorithm based on a novel formalism, and experimentally compare its performance to that of a linear time top-down regression algorithm. We show that our algorithm is faster and more accurate. Applications include pattern recognition and qualitative modeling.

1 Introduction 

Monotonicity is one of the most natural and important qualitative properties for sequences of data points. It is easy to determine where the values are strictly going up or down, but we only want to identify significant monotonicity. For example, the drop from 2 to 1.9 in the array 0, 1, 2, 1.9, 3, 4 might not be significant and might even be noise-related. The quasi-monotonic segmentation problem is to determine where the data is approximately increasing or decreasing.

We present a metric for the quasi-monotonic segmentation problem called the Optimal Monotonic Approximation Function Error (OMAFE); this metric differs from the previously introduced OPMAFE metric [2] since it applies to all segmentations and not just "extremal" segmentations. We formalize the novel concept of a maximal *-pair and show that it can be used to define a unique labelling of the extrema leading to an optimal segmentation algorithm. We also present an optimal linear time algorithm to solve the quasi-monotonic segmentation problem given a segment budget, together with an experimental comparison to quantify the benefits of our algorithm.

2 Monotonicity Error Metric (OMAFE)

Suppose we have $n$ samples, denoted $F : D = \{x_1, \ldots, x_n\} \to \mathbb{R}$ with $x_1 < x_2 < \ldots < x_n$. We define $F|_{[a,b]}$ as the restriction of $F$ over $D \cap [a,b]$. We seek the best monotonic (increasing or decreasing) function $f : \mathbb{R} \to \mathbb{R}$ approximating $F$. Let $\Omega_\uparrow$ (resp. $\Omega_\downarrow$) be the set of all monotonic increasing (resp. decreasing) functions. The Optimal Monotonic Approximation Function Error (OMAFE) is $\min_{f \in \Omega} \max_{x \in D} |f(x) - F(x)|$ where $\Omega$ is either $\Omega_\uparrow$ or $\Omega_\downarrow$.

A segmentation of a set $D$ is a sequence $S = X_1, \ldots, X_K$ of intervals in $\mathbb{R}$ with $[\min D, \max D] = \bigcup_i X_i$ such that $\max X_i = \min X_{i+1} \in D$ and $X_i \cap X_j = \emptyset$ for $j \notin \{i-1, i, i+1\}$. Alternatively, we can define a segmentation from the set of points $X_i \cap X_{i+1} = \{y_{i+1}\}$, $y_1 = \min X_1$, and $y_{K+1} = \max X_K$.

Given $F : \{x_1, \ldots, x_n\} \to \mathbb{R}$ and a segmentation, the Optimal Monotonic Approximation Function Error (OMAFE) of the segmentation is $\max_i \mathrm{OMAFE}(F|_{X_i})$ where the monotonicity type (increasing or decreasing) of the segment $X_i$ is determined by the sign of $F(\max X_i) - F(\min X_i)$. Whenever $F(\max X_i) = F(\min X_i)$, we say the segment has no direction and the best monotonic approximation is just the flat function having value $(\max F|_{X_i} + \min F|_{X_i})/2$. The error is computed over each interval independently; optimal monotonic approximation functions are not required to agree at $\max X_i = \min X_{i+1}$. Segmentations should alternate between increasing and decreasing; otherwise, sequences such as 0, 2, 1, 0, 2 can be segmented as two increasing segments 0, 2, 1 and 1, 0, 2. We consider it natural to aggregate segments with the same monotonicity.

We solve for the best monotonic function as follows. If we seek the best monotonic increasing function, we first define $\overline{f}_\uparrow(x) = \max\{F(y) : y \le x\}$ (the maximum of all previous values) and $\underline{f}_\uparrow(x) = \min\{F(y) : y \ge x\}$ (the minimum of all values to come). If we seek the best monotonic decreasing function, we define $\overline{f}_\downarrow(x) = \max\{F(y) : y \ge x\}$ (the maximum of all values to come) and $\underline{f}_\downarrow(x) = \min\{F(y) : y \le x\}$ (the minimum of all previous values). These functions, which can be computed in linear time, are all we need to solve for the best approximation function, as shown by the next theorem, which is a well-known result [5].

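Theorem 1 Given $F : D = \{x_1, \ldots, x_n\} \to \mathbb{R}$, a best monotonic increasing approximation function to $F$ is $f_\uparrow = (\overline{f}_\uparrow + \underline{f}_\uparrow)/2$ and a best monotonic decreasing approximation function is $f_\downarrow = (\overline{f}_\downarrow + \underline{f}_\downarrow)/2$. The corresponding error (OMAFE) is $\max_{x \in D} (|\overline{f}_\uparrow(x) - \underline{f}_\uparrow(x)|)/2$ or $\max_{x \in D} (|\overline{f}_\downarrow(x) - \underline{f}_\downarrow(x)|)/2$ respectively.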
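As an illustration of Theorem 1, the following Python sketch computes the OMAFE of a single array from a running maximum and a running minimum, and then the OMAFE of a segmentation given as a list of boundary indices. The function names and the list-of-indices representation are our own assumptions for this sketch; it assumes a non-empty Python list as input.

```python
from itertools import accumulate

def omafe_increasing(values):
    """Error of the best monotonic increasing fit, following Theorem 1."""
    # Upper envelope: maximum of all values up to and including each point.
    upper = list(accumulate(values, max))
    # Lower envelope: minimum of all values from each point onward.
    lower = list(accumulate(reversed(values), min))[::-1]
    return max(u - l for u, l in zip(upper, lower)) / 2

def omafe_decreasing(values):
    # Reversing the array turns a decreasing fit into an increasing one.
    return omafe_increasing(values[::-1])

def omafe_segmentation(values, boundaries):
    """Largest per-segment error; boundaries are indices y_1 < ... < y_{K+1}."""
    worst = 0.0
    for a, b in zip(boundaries, boundaries[1:]):
        segment = values[a:b + 1]  # adjacent segments share their end point
        if segment[-1] > segment[0]:
            error = omafe_increasing(segment)
        elif segment[-1] < segment[0]:
            error = omafe_decreasing(segment)
        else:
            # No direction: a flat fit, whose error is half the range.
            error = (max(segment) - min(segment)) / 2
        worst = max(worst, error)
    return worst
```

For the array 0, 2, 1, 0, 2 discussed above, segmenting at indices 0, 2, 4 gives an error of 0.5, whereas segmenting at the extrema (indices 0, 1, 3, 4) gives an error of 0.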

3 A Scale-Based Algorithm for Quasi-Monotonic Segmentation

We use the following proposition to prove that the segmentations we generate are optimal (see Theorem 2).

Proposition 1 A segmentation $y_1, \ldots, y_{K+1}$ of $F : D = \{x_1, \ldots, x_n\} \to \mathbb{R}$ with alternating monotonicity has a minimal OMAFE $\epsilon$ for a number of alternating segments $K$ if

A. $F(y_i) = \max F([y_{i-1}, y_{i+1}])$ or $F(y_i) = \min F([y_{i-1}, y_{i+1}])$ for $i = 2, \ldots, K$;

B. in all intervals $[y_i, y_{i+1}]$ for $i = 1, \ldots, K$, there exist $z_1, z_2$ such that $|F(z_2) - F(z_1)| > 2\epsilon$.

For simplicity, we assume $F$ has no consecutive equal values, i.e. $F(x_i) \ne F(x_{i+1})$ for $i = 1, \ldots, n-1$; our algorithms assume all but one of consecutive equal values have been removed. We say $x_i$ is a maximum if $i \ne 1$ implies $F(x_i) > F(x_{i-1})$ and if $i \ne n$ implies $F(x_i) > F(x_{i+1})$. Minima are defined similarly.
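The preprocessing and the definition of extrema can be sketched in Python as follows. This is a minimal illustration with hypothetical helper names and 0-based indices; note that under the definition above the two end points are always extrema.

```python
def remove_consecutive_duplicates(values):
    """Keep one value from each run of consecutive equal values."""
    return [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]

def find_extrema(values):
    """Return (index, kind) pairs, where kind is 'max' or 'min'.

    Assumes consecutive equal values have already been removed.
    """
    n = len(values)
    extrema = []
    for i, v in enumerate(values):
        # A missing neighbour (at either end) imposes no constraint.
        above_left = i == 0 or v > values[i - 1]
        above_right = i == n - 1 or v > values[i + 1]
        below_left = i == 0 or v < values[i - 1]
        below_right = i == n - 1 or v < values[i + 1]
        if above_left and above_right:
            extrema.append((i, "max"))
        elif below_left and below_right:
            extrema.append((i, "min"))
    return extrema
```

On the array 0, 1, 2, 1.9, 3, 4 from the introduction, this reports extrema at indices 0, 2, 3 and 5.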

Our mathematical approach is based on the concept of a $\delta$-pair:



Definition 1 The tuple $x, y$ ($x < y \in D$) is a $\delta$-pair (or a pair of scale $\delta$) for $F$ if $|F(y) - F(x)| \ge \delta$ and for all $z \in D$, $x < z < y$ implies $|F(z) - F(x)| < \delta$ and $|F(y) - F(z)| < \delta$. A $\delta$-pair's direction is increasing or decreasing according to whether $F(y) > F(x)$ or $F(y) < F(x)$.
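Definition 1 can be checked directly by brute force. The sketch below is only illustrative: the function name is hypothetical, indices are 0-based, and it assumes the end-point condition is non-strict (a difference of at least $\delta$) while the interior conditions are strict.

```python
def is_delta_pair(values, i, j, delta):
    """True if positions i < j form a delta-pair in the sense of Definition 1."""
    if not 0 <= i < j < len(values):
        return False
    # The end points must differ by at least delta (assumed non-strict).
    if abs(values[j] - values[i]) < delta:
        return False
    # Every interior point must stay within delta of both end points.
    return all(
        abs(values[k] - values[i]) < delta and abs(values[j] - values[k]) < delta
        for k in range(i + 1, j)
    )
```

For the sequence 1, 3, 2, 4, the end points (1 and 4) form a pair of scale 3 but not of scale 2, because the interior value 3 differs from the first value by 2.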

$\delta$-pairs having opposite directions cannot overlap but they may share an end point. $\delta$-pairs of the same direction may overlap, but may not be nested. We use the term "*-pair" to indicate a $\delta$-pair having an unspecified $\delta$. We say that a *-pair is significant at scale $\delta$ if it is of scale $\delta'$ for $\delta' \ge \delta$.

We define $\delta$-monotonicity as follows:

Definition 2 Let $X$ be an interval. $F$ is $\delta$-monotonic on $X$ if all $\delta$-pairs in $X$ have the same direction; $F$ is strictly $\delta$-monotonic when there exists at least one such $\delta$-pair. In this case:

• $F$ is $\delta$-increasing on $X$ if $X$ contains an increasing $\delta$-pair.

• $F$ is $\delta$-decreasing on $X$ if $X$ contains a decreasing $\delta$-pair.

A $\delta$-monotonic interval $X$ satisfies $\mathrm{OMAFE}(X) < \delta/2$. We say that a *-pair $x, y$ is maximal if whenever $z_1, z_2$ is a *-pair of a larger scale in the same direction containing $x, y$, then there exists a *-pair $w_1, w_2$ of an opposite direction contained in $z_1, z_2$ and containing $x, y$. For example, the sequence 1, 3, 2, 4 has 2 maximal *-pairs: 1, 4 and 3, 2. Maximal *-pairs of opposite direction may share a common point, whereas maximal *-pairs of the same direction may not. Maximal *-pairs cannot overlap, meaning that it cannot be the case that exactly one end point of a maximal *-pair lies strictly between the end points of another maximal *-pair; either neither point lies strictly between or both do. In the case that both do, we say that the one maximal *-pair properly contains the other. All *-pairs must be contained in a maximal *-pair.

Lemma 1 The smallest maximal *-pair containing 
a *-pair must be of the same direction. 

Our approach is to label each extremum in $F$ with a scale parameter $\delta$ saying that this extremum is "significant" at scale $\delta$ and below. Our intuition is that by picking extrema at scale $\delta$, we should have a segmentation having error less than $\delta/2$.

Definition 3 The scale labelling of an extremum x 
is the maximum of the scales of the maximal *-pairs 
for which it is an end point. 



For example, given the sequence 1, 3, 2, 4 with 2 maximal *-pairs (1, 4 and 3, 2), we would give the following labels in order: 3, 1, 1, 3.
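To make the definitions of maximal *-pairs and scale labelling concrete, here is a brute-force Python sketch that follows them literally. It is meant only for checking small examples (its running time is polynomial of high degree, not linear), the function names are hypothetical, and it assumes that the scale of a *-pair is the difference $|F(y) - F(x)|$ between its end points and that "containing" allows shared end points.

```python
def star_pair_scale(values, i, j):
    """Scale of the *-pair (i, j), or None if (i, j) is not a *-pair."""
    scale = abs(values[j] - values[i])
    if scale == 0:
        return None
    for k in range(i + 1, j):
        if abs(values[k] - values[i]) >= scale or abs(values[j] - values[k]) >= scale:
            return None
    return scale

def maximal_star_pairs(values):
    """All maximal *-pairs as (i, j) index pairs, by exhaustive search."""
    n = len(values)
    pairs = {}  # (i, j) -> (scale, is_increasing)
    for i in range(n):
        for j in range(i + 1, n):
            scale = star_pair_scale(values, i, j)
            if scale is not None:
                pairs[(i, j)] = (scale, values[j] > values[i])

    def contains(outer, inner):
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    maximal = []
    for p, (scale, up) in pairs.items():
        is_maximal = True
        for z, (z_scale, z_up) in pairs.items():
            if z_up == up and z_scale > scale and contains(z, p):
                # A larger same-direction pair must be "blocked" by an
                # opposite-direction pair lying between it and p.
                blocked = any(
                    w_up != up and contains(z, w) and contains(w, p)
                    for w, (_, w_up) in pairs.items()
                )
                if not blocked:
                    is_maximal = False
                    break
        if is_maximal:
            maximal.append(p)
    return maximal

def scale_labels(values):
    """Label each end point with the largest scale of its maximal *-pairs."""
    labels = {}
    for i, j in maximal_star_pairs(values):
        scale = abs(values[j] - values[i])
        for e in (i, j):
            labels[e] = max(labels.get(e, 0), scale)
    return labels
```

On the sequence 1, 3, 2, 4 this search finds the two maximal *-pairs given in the text and yields the labels 3, 1, 1, 3.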

Definition 4 Given $\delta > 0$, a maximal alternating sequence of $\delta$-extrema $Y = y_1, \ldots, y_{K+1}$ is a sequence of extrema each having scale label at least $\delta$, having alternating types (maximum/minimum), and such that there exists no sequence properly containing $Y$ having these same properties. From $Y$ we define a maximal alternating $\delta$-segmentation of $D$ by segmenting at the points $x_1, y_2, \ldots, y_K, x_n$.

Theorem 2 Given $\delta > 0$, let $P = S_1 \ldots S_K$ be a maximal alternating $\delta$-segmentation derived from a maximal alternating sequence $y_1 \ldots y_{K+1}$ of $\delta$-extrema. Then any alternating segmentation $Q$ having $\mathrm{OMAFE}(Q) < \mathrm{OMAFE}(P)$ has at least $K+1$ segments.

Sequences of extrema labelled at least $\delta$ are generally not maximal alternating. For example, the sequence 0, 10, 9, 10, 0 is scale labelled 10, 10, 1, 10, 10. However, a simple relabelling of certain extrema can make them maximal alternating. Consider two same-sense extrema $z_1 < z_2$ such that lying between them there exists no extremum having scale at least as large as the minimum of the two extrema's scales. We must have $F(z_1) = F(z_2)$, since otherwise the point upon which $F$ has the lesser value could not be the endpoint of a maximal *-pair. This is the only situation which causes choice when constructing a maximal alternating sequence of $\delta$-extrema. To eliminate this choice, replace the scale label on $z_1$ with the largest scale of the opposite-sense extrema lying between them.

3.1 Computing a Scale Labelling Efficiently 

Algorithm 1 produces a scale labelling in linear time. Extrema from the original data are visited in order, and they alternate (maxima/minima) since we only pick one of the values when there are repeated values (such as 1, 1, 1).

The algorithm has a main loop (lines 5 to 12) where it labels extrema as it identifies extremal *-pairs, and stacks the extrema it cannot immediately label. At all times, the stack (line 3) contains minima and maxima in strictly increasing and decreasing order respectively. Also at all times, the last two extrema at the bottom of the stack are the absolute maximum and absolute minimum (found so far). Observe that we can only label an extremum as we find new extremal *-pairs (lines 7, 10, and 14).



• If the stack is empty or contains only one extremum, we simply add the new extremum (line 12).

• If there are only 2 extrema $z_1, z_2$ in the stack and we found either a new absolute maximum or a new absolute minimum ($z_3$), we can pop and label the oldest one ($z_1$) (lines 9, 10, and 11) because the old pair ($z_1, z_2$) forms a maximal *-pair and thus must be bounded by extrema having at least the same scale, while the oldest value ($z_1$) doesn't belong to a larger maximal *-pair. Otherwise, if there are only 2 extrema $z_1, z_2$ in the stack and the new extremum $z_3$ satisfies $z_3 \in (\min(z_1, z_2), \max(z_1, z_2))$, then we add it to the stack since no labelling is possible yet.

• While the stack contains more than 2 extrema (lines 6, 7 and 8), we consider the last three points on the stack ($s_3, s_2, s_1$) where $s_1$ is the last point added. Let $z$ be the value of the new extremum. If $z \in (\min(s_1, s_2), \max(s_1, s_2))$, then it is simply added to the stack since we cannot yet label any of these points; we exit the while loop. Otherwise, we have a new maximum (resp. minimum) exceeding (resp. lower than) or matching the previous one on the stack, and hence $s_1, s_2$ is a maximal *-pair. If $z \ne s_2$, then $s_3, z$ is a maximal *-pair and thus $s_2$ cannot be the end of a maximal *-pair and $s_1$ cannot be the beginning of one, hence both $s_2$ and $s_1$ are labelled. If $z = s_2$ then we have successive maxima or minima and the same labelling as for $z \ne s_2$ applies.

During the "unstacking" (lines 13 and following), 
we visit a sequence of minima and maxima forming 
increasingly larger maximal *-pairs. 
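The steps of Algorithm 1 can be sketched in Python as follows. This is our reading of the pseudocode rather than the authors' implementation: the stack holds indices into the array, the comparisons against Second(S) are taken to be non-strict ("exceeding or matching"), and the function name and the dictionary output are assumptions made for this illustration. The input is assumed to have at least two values and no consecutive equal values.

```python
def scale_labelling(d):
    """Scale labels {index: scale} for the extrema of d, after Algorithm 1."""
    n = len(d)
    if n < 2:
        return {}
    # Extrema in increasing index order; the end points always qualify.
    extrema = [i for i in range(n)
               if i in (0, n - 1) or (d[i] - d[i - 1]) * (d[i + 1] - d[i]) < 0]
    labels = {}
    stack = []  # stack[-1] plays the role of First(S), stack[-2] of Second(S)

    def is_max(e):
        neighbour = d[e + 1] if e == 0 else d[e - 1]
        return d[e] > neighbour

    def reaches_second(e):
        # New maximum at or above Second(S), or new minimum at or below it.
        second = d[stack[-2]]
        return d[e] >= second if is_max(e) else d[e] <= second

    def delta():
        return abs(d[stack[-1]] - d[stack[-2]])

    for e in extrema:                                   # line 5
        while len(stack) > 2 and reaches_second(e):     # line 6
            scale = delta()
            labels[stack.pop()] = scale                 # lines 7 and 8
            labels[stack.pop()] = scale
        if len(stack) == 2 and reaches_second(e):       # line 9
            labels[stack[-2]] = delta()                 # line 10
            del stack[-2]                               # line 11
        stack.append(e)                                 # line 12
    while len(stack) > 2:                               # line 13
        labels[stack[-1]] = delta()                     # line 14
        stack.pop()                                     # line 15
    scale = delta()                                     # line 16
    labels[stack[0]] = labels[stack[1]] = scale
    return labels
```

On the sequence 1, 3, 2, 4 the sketch returns the labels 3, 1, 1, 3. On 0, 10, 9, 10, 0 it returns 10, 1, 1, 10, 10, which coincides with the labelling obtained after the relabelling of same-sense extrema described earlier.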

Once the labelling is complete, we find the $K+2$ extrema having the largest scales in time $O(nK)$ using $O(K)$ memory. We then remove all extrema having the same scale as the smallest scale among these $K+2$ extrema (removing at least one), and we replace the first and the last extrema by $0$ and $n-1$ respectively. The result is an optimal segmentation having at most $K$ segments.
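A possible Python sketch of this selection step is given below. It is not the authors' code: it uses `heapq.nlargest` (which runs in $O(n \log K)$ rather than the $O(nK)$ scan described above), takes the labels as a dictionary from extremum index to scale, and makes two assumptions that the text does not spell out: when there are fewer than $K+2$ extrema nothing is removed, and when fewer than two extrema survive the whole array is returned as a single segment.

```python
import heapq

def segment_boundaries(labels, n, K):
    """Boundary indices of a segmentation with at most K segments."""
    # The K + 2 extrema with the largest scale labels, largest first.
    top = heapq.nlargest(K + 2, labels.items(), key=lambda item: item[1])
    if len(top) < K + 2:
        # Assumption: too few extrema to exceed the budget, keep them all.
        kept = sorted(index for index, _ in top)
    else:
        smallest = top[-1][1]
        # Drop every extremum tied with the smallest of these scales.
        kept = sorted(index for index, scale in top if scale > smallest)
    if len(kept) < 2:
        # Assumption: fall back to a single segment over the whole array.
        return [0, n - 1]
    # The first and last boundaries are moved to the ends of the array.
    kept[0], kept[-1] = 0, n - 1
    return kept
```

For instance, with the labels 6, 4, 4, 2, 2, 6 of the sequence 0, 5, 1, 4, 2, 6 and a budget of K = 3, the two extrema of scale 2 are dropped and the boundaries are the indices 0, 1, 2, 5.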

Algorithm 1 Algorithm to compute the scale labelling in $O(n)$ time.
1: INPUT: an array $d$ containing the $y$ values indexed from $0$ to $n-1$, repeated consecutive values have been removed
2: OUTPUT: a scale labelling for all extrema
3: $S \leftarrow$ empty stack, First($S$) is the value on top, Second($S$) is the second value
4: define $\delta(d, S) = |d_{\mathrm{First}(S)} - d_{\mathrm{Second}(S)}|$
5: for $e$ index of an extremum in $d$, $e$'s are visited in increasing order do
6:     while length($S$) $> 2$ and ($e$ is a minimum such that $d_e \le$ Second($S$) or $e$ is a maximum such that $d_e \ge$ Second($S$)) do
7:         label First($S$) and Second($S$) with $\delta(d, S)$
8:         pop stack $S$ twice
9:     if length($S$) is 2 and ($e$ is a minimum such that $d_e \le$ Second($S$) or $e$ is a maximum such that $d_e \ge$ Second($S$)) then
10:        label Second($S$) with $\delta(d, S)$
11:        remove Second($S$) from stack $S$
12:    stack $e$ to $S$
13: while length($S$) $> 2$ do
14:    label First($S$) with $\delta(d, S)$
15:    pop stack $S$
16: label First($S$) and Second($S$) with $\delta(d, S)$

4 Experimental Results and Comparison to Top-Down Linear Spline

We compare our optimal $O(nK)$ algorithm with the top-down linear spline algorithm [4], which runs in $O(nK^2)$ time. It successively segments the data starting with only one segment, each time picking the segment with the worst linear regression error and finding the best segmentation point; the linear regression is not continuous from one segment to the other. The regression error can be computed in constant time if one has precomputed the range moments [?]. We run through the segments and aggregate consecutive segments having the same sign, where the sign of a segment $[y_k, y_{k+1}]$ is defined by $F(y_{k+1}) - F(y_k)$.
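One common way to obtain a constant-time regression error, assumed here for illustration, is to precompute prefix sums of the moments $\sum x$, $\sum y$, $\sum x^2$, $\sum xy$ and $\sum y^2$, so that the moments of any range are differences of two prefix values. The class below is a sketch under that assumption (the class and method names are hypothetical, and $x$ is taken to be the sample index); it returns the sum of squared residuals of the least-squares line over a range.

```python
from itertools import accumulate

class RangeRegression:
    """Least-squares line error over any index range in constant time."""

    def __init__(self, y):
        xs = range(len(y))

        def prefix(seq):
            # prefix(seq)[k] is the sum of the first k terms.
            return [0.0] + list(accumulate(seq))

        self.sx = prefix(float(x) for x in xs)
        self.sy = prefix(float(v) for v in y)
        self.sxx = prefix(float(x * x) for x in xs)
        self.sxy = prefix(float(x * v) for x, v in zip(xs, y))
        self.syy = prefix(float(v * v) for v in y)

    def error(self, i, j):
        """Sum of squared residuals of the best line over samples i..j-1."""
        n = j - i
        sx = self.sx[j] - self.sx[i]
        sy = self.sy[j] - self.sy[i]
        # Centered second moments of the range.
        sxx = self.sxx[j] - self.sxx[i] - sx * sx / n
        sxy = self.sxy[j] - self.sxy[i] - sx * sy / n
        syy = self.syy[j] - self.syy[i] - sy * sy / n
        if sxx == 0:
            # A single sample: the line passes through it exactly.
            return max(syy, 0.0)
        # Clamp tiny negative values caused by floating-point rounding.
        return max(syy - sxy * sxy / sxx, 0.0)
```

With such a structure, a top-down algorithm can evaluate each candidate split point of a segment in constant time after a linear-time precomputation.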

4.1 Data Source 

We used samples from the MIT-BIH Arrhythmia Database [?]. These ECG recordings used a sampling rate of 360 samples per second per channel with 11-bit resolution. We keep 4000 samples (11 seconds), containing about 14 pulses, and we do no preprocessing such as baseline correction. We estimate that a typical pulse has about 5 "easily" identifiable monotonic segments. Hence, out of 14 pulses, we estimate that there are about 70 significant monotonic segments, some of which match the domain-specific markers (reference points P, Q, R, S, and T). A qualitative description of such data is useful for pattern matching applications.

Figure 1. Results from experiments over ECG data: the optimal algorithm is considerably more accurate.

4.2 Results

We implemented both the scale-based segmentation algorithm and the L2 norm top-down linear spline approximation algorithm in Python (version 2.3). Each run was repeated 3 times and we observed that the scale-based segmentation implementation is faster than the top-down linear spline approximation implementation by a factor of 10.

The OMAFE with respect to the maximal number of segments ($K$) is given in Fig. 1. Counting about 5 monotonic segments per pulse with a total of 14 pulses, there should be about 70 monotonic segments in the 4000 samples under consideration. We see that the decrease in OMAFE with the addition of new segments starts to level off between 50 and 70 segments, as predicted.